Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Authors

  • Enrique Flores
  • Alberto Barrón-Cedeño
  • Lidia Moreno
  • Paolo Rosso

Abstract

Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are available just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source code in its compiled version. When dealing with cross-language source code re-use, however, traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text, with its own syntax and structure, so we aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source code with high performance.
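To make the LSA idea concrete, here is a minimal sketch of how cross-language similarity could be measured once source code is treated as plain text. It is not the authors' implementation: the character 3-gram representation, the whitespace and case normalisation, the use of raw counts without weighting, and the helper names `char_ngrams`, `term_document_matrix` and `lsa_similarity` are all assumptions made for illustration. The sketch builds a term-document matrix, applies a truncated SVD, and compares documents by cosine similarity in the reduced latent space.

```python
import re

import numpy as np


def char_ngrams(code, n=3):
    """Split source code into overlapping character n-grams.

    Assumption: lowercasing and collapsing whitespace is used as a simple,
    language-independent normalisation; the paper's preprocessing may differ.
    """
    text = re.sub(r"\s+", " ", code.lower()).strip()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def term_document_matrix(docs, n=3):
    """Build a (terms x documents) matrix of raw n-gram counts (no weighting)."""
    vocab = sorted({g for d in docs for g in char_ngrams(d, n)})
    index = {g: i for i, g in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for gram in char_ngrams(doc, n):
            matrix[index[gram], j] += 1
    return matrix


def lsa_similarity(docs, k=2, n=3):
    """Pairwise cosine similarity of documents in a k-dimensional LSA space."""
    a = term_document_matrix(docs, n)
    # Truncated SVD: A ~ U_k S_k V_k^T; each document becomes a row of V_k S_k.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    latent = (np.diag(s[:k]) @ vt[:k, :]).T
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    # Documents with no component in the latent space keep a zero vector.
    norms[norms == 0] = 1.0
    unit = latent / norms
    return unit @ unit.T


if __name__ == "__main__":
    corpus = [
        "for (int i = 0; i < n; i++) { sum += a[i]; }",  # Java, summation
        "for i in range(n):\n    sum += a[i]",          # Python, summation
        'System.out.println("hello world");',           # Java, printing
        'print("hello world")',                          # Python, printing
    ]
    print(np.round(lsa_similarity(corpus, k=2), 2))
```

The value `k=2` only keeps the toy example readable; LSA on a realistic corpus typically keeps far more latent dimensions, and a decision threshold on the resulting scores (not specified here) would be needed to flag candidate re-use pairs.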


Similar resources

PAN@FIRE: Overview of CL-SOCO Track on the Detection of Cross-Language SOurce COde Re-use

The detection of source code re-use is an important research field for both software industry and academia fields. This paper summarizes the goals, organization and results of the second SOCO competitive evaluation campaign for systems that automatically detect the source code re-use phenomenon. PAN@FIRE shared task, named Cross-Language SOurce COde Re-use (CL-SOCO), focused on the detection of...

Using latent semantic analysis to identify similarities in source code to support program understanding

The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent Semantic Analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective in their usage. This methodology is assessed for applic...

Measuring MT Adequacy Using Latent Semantic Analysis

Translation adequacy is defined as the amount of semantic content from the source language document that is conveyed in the target language document. As such, it is more difficult to measure than intelligibility since semantic content must be measured in two documents and then compared. Latent Semantic Analysis is a content measurement technique used in language learner evaluation that exhibits...

Automatic Software Clustering via Latent Semantic Analysis

This paper appears in the 14th IEEE ASE’99, Cocoa Beach FL, Oct. 12-15, pp. 251-254. Abstract: The paper describes the initial results of applying Latent Semantic Analysis (LSA) to program source code and associated documentation. Latent Semantic Analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflecti...

GnuTutor: An Open Source Intelligent Tutoring System Based on AutoTutor

This paper presents GnuTutor, an open source intelligent tutoring system (ITS) inspired by the AutoTutor ITS. The goal of GnuTutor is to create a freely available, open source ITS platform that can be used by schools and researchers alike. To achieve this goal, significant departures from AutoTutor’s current design were made so that GnuTutor would use a smaller, non-proprietary code base but ha...

Journal:
  • J. UCS

Volume 21

Publication date: 2015